Code
import pandas as pd
df = pd.read_csv("q1data.csv")
x: income
y: life expectancy
bubble size: population size
vertical translucent lines, positioned by income on the x-axis with a scale of 500: income level categories (1/2/3/4)
colour: countries are coloured by continent
watermark: the year of the data
import pandas as pd
import plotnine as p9

df = pd.read_csv("q1data.csv")

four_region_colors = {
    'asia': '#ff5872',
    'europe': '#ffec33',
    'africa': '#00d5e9',
    'americas': '#99ef33'
}
(p9.ggplot(df, p9.aes(x='income', y='life_exp'))
 + p9.geom_point(p9.aes(fill='four_regions', size='population'),
                 alpha=1, color='black', stroke=0.2)
 # NOTE: a lower limit of 0 has no position on a log scale (log10(0) is
 # undefined), which triggers the divide-by-zero warning shown below;
 # a positive lower limit avoids it.
 + p9.scale_x_log10(limits=(0, 128000),
                    breaks=[500, 1000, 2000, 4000, 8000, 16000, 32000, 64000],
                    labels=['500', '1000', '2000', '4000', '8000', '16k', '32k', '64k'])
 + p9.scale_y_continuous(limits=(20, 90),
                         breaks=[20, 30, 40, 50, 60, 70, 80, 90],
                         labels=['20', '30', '40', '50', '60', '70', '80', '90'])
 + p9.theme_bw()
 + p9.theme(
     panel_grid_major=p9.element_line(color='#dddddd', alpha=0.3, size=0.5),
     panel_grid_minor=p9.element_line(color='#eeeeee', alpha=0.25, size=0.3),
     axis_ticks_major_x=p9.element_blank(),
     axis_ticks_major_y=p9.element_blank(),
     axis_text_x=p9.element_text(color='black', alpha=0.4),
     axis_text_y=p9.element_text(color='black', alpha=0.4),
     axis_title_x=p9.element_text(color='black', alpha=0.6),
     axis_title_y=p9.element_text(color='black', alpha=0.6),
     panel_border=p9.element_rect(alpha=0.5),
     figure_size=(7, 4))
 # Large, faint year label drawn behind the points as a watermark.
 + p9.annotate(geom='text', label="2010", x=6000, y=50, size=140,
               color='grey', alpha=0.2, ha='center', va='center')
 + p9.scale_fill_manual(values=four_region_colors)
 + p9.labs(x="Income", y="Life Expectancy")
 + p9.scale_size(range=[0.2, 16])
)
/opt/anaconda3/lib/python3.13/site-packages/mizani/transforms.py:374: RuntimeWarning: divide by zero encountered in log10
/opt/anaconda3/lib/python3.13/site-packages/plotnine/layer.py:372: PlotnineWarning: geom_point : Removed 4 rows containing missing values.
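The two warnings above can be reasoned about without plotnine. The sketch below is a plain-Python illustration, not plotnine's implementation: `log10_limits` and `count_dropped` are hypothetical helper names and the income values are made up. It shows why a limit of 0 cannot be placed on a log axis, and how values that are missing or fall outside the axis limits add up to a "Removed N rows" count, assuming out-of-range points are treated as missing.

```python
import math


def log10_limits(lower, upper):
    """Return the limits in log10 space; log10 is undefined for values <= 0."""
    if lower <= 0 or upper <= 0:
        raise ValueError("log-scale limits must be strictly positive")
    return math.log10(lower), math.log10(upper)


def count_dropped(values, lower, upper):
    """Count values that are missing (None/NaN) or outside [lower, upper]."""
    dropped = 0
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            dropped += 1
        elif not lower <= v <= upper:
            dropped += 1
    return dropped


# Made-up income values for illustration only (not taken from q1data.csv).
incomes = [400, 1200, None, 64000, 150000, float("nan")]
print(log10_limits(500, 128000))
print(count_dropped(incomes, 500, 128000))
```

Running the same count on the real `income` and `life_exp` columns with the limits used in the plot is one way to check which rows account for the removed points.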
import plotnine as p9

(p9.ggplot(df, p9.aes(x='income', y='life_exp', fill='four_regions'))
 + p9.geom_boxplot(alpha=0.3)
 + p9.scale_y_continuous(breaks=[20, 30, 40, 50, 60, 70, 80, 90])
 + p9.theme_bw()
 + p9.scale_fill_manual(values=four_region_colors)
)
/opt/anaconda3/lib/python3.13/site-packages/plotnine/layer.py:293: PlotnineWarning: stat_boxplot : Removed 2 rows containing non-finite values.
Several other graph types can take input from 4 variables. One example is geom_violin which, like the boxplot, shows the trends of different income levels and their corresponding life expectancy across different regions of the world and different populations.
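To make the comparison concrete, the sketch below computes the per-region summary that a boxplot draws (minimum, quartiles, maximum) using only the standard library. The function name `five_number_summary` and the sample values are assumptions for illustration; plotnine's own quantile and whisker rules may differ slightly.

```python
from collections import defaultdict
from statistics import quantiles


def five_number_summary(rows):
    """Group (region, value) pairs and return (min, Q1, median, Q3, max) per region."""
    groups = defaultdict(list)
    for region, value in rows:
        if value is not None:  # skip missing values, as the plots do
            groups[region].append(value)
    summary = {}
    for region, values in groups.items():
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        summary[region] = (min(values), q1, median, q3, max(values))
    return summary


# Made-up life-expectancy values (not taken from q1data.csv).
sample = [
    ("asia", 60), ("asia", 65), ("asia", 70), ("asia", 75), ("asia", 80),
    ("europe", 72), ("europe", 76), ("europe", 78), ("europe", 80), ("europe", 82),
    ("africa", None),
]
for region, stats in five_number_summary(sample).items():
    print(region, stats)
```

A violin plot replaces these five numbers with a smoothed density of the same values, which is why it can reveal shapes that a boxplot hides.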
x axis: Exports
y axis: Imports
watermark: Year of data
bubble size: Energy use amount in different countries
bubble colour: corresponding to 4 different regions of the world
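How a value such as energy use becomes a bubble size can be sketched in a few lines. This is an illustration under an assumption, not plotnine's code: it follows the common area convention, in which the value is rescaled to [0, 1] and its square root is mapped onto the size range, so that bubble area rather than radius tracks the value. The helper name `bubble_size` and the default range are hypothetical.

```python
import math


def bubble_size(value, vmin, vmax, size_range=(0.2, 15)):
    """Map a value to a point size using a square-root (area) rescale."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    fraction = (value - vmin) / (vmax - vmin)  # rescale to [0, 1]
    fraction = min(max(fraction, 0.0), 1.0)    # clamp values outside the data range
    lo, hi = size_range
    return lo + (hi - lo) * math.sqrt(fraction)  # sqrt: size behaves like a radius


# A value a quarter of the way through the data range gets half the size span.
print(bubble_size(25, 0, 100, size_range=(0, 10)))
```

Under this convention, widening the size range makes large and small countries easier to tell apart, at the cost of more overlap between bubbles.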
df2 = pd.read_csv("q2_data.csv")

four_region_colors = {
    'asia': '#ff5872',
    'europe': '#ffec33',
    'africa': '#00d5e9',
    'americas': '#99ef33'
}
(p9.ggplot(df2, p9.aes(x='exports', y='imports'))
 + p9.geom_point(p9.aes(fill='four_regions', size='energy'),
                 alpha=1, color='black', stroke=0.2)
 + p9.scale_x_continuous(limits=(0, 240),
                         breaks=[20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220],
                         labels=['20', '40', '60', '80', '100', '120', '140', '160', '180', '200', '220'])
 + p9.scale_y_continuous(limits=(0, 450),
                         breaks=[50, 100, 150, 200, 250, 300, 350, 400],
                         labels=['50', '100', '150', '200', '250', '300', '350', '400'])
 + p9.theme_bw()
 + p9.theme(
     panel_grid_major=p9.element_line(color='#dddddd', alpha=0.3, size=0.5),
     panel_grid_minor=p9.element_line(color='#eeeeee', alpha=0.25, size=0.3),
     axis_ticks_major_x=p9.element_blank(),
     axis_ticks_major_y=p9.element_blank(),
     axis_text_x=p9.element_text(color='black', alpha=0.4),
     axis_text_y=p9.element_text(color='black', alpha=0.4),
     axis_title_x=p9.element_text(color='black', alpha=0.6),
     axis_title_y=p9.element_text(color='black', alpha=0.6),
     panel_border=p9.element_rect(alpha=0.5),
     figure_size=(7, 4))
 # Large, faint year label drawn behind the points as a watermark.
 + p9.annotate(geom='text', label='1997', x=120, y=220, size=140,
               color='grey', alpha=0.2, ha='center', va='center')
 + p9.scale_fill_manual(values=four_region_colors)
 # A plain '%' is used here: '\%' is not a valid Python escape sequence.
 + p9.labs(x="Exports (% GDP)", y="Imports (% GDP)")
 + p9.scale_size(range=[0.2, 15])
)
/opt/anaconda3/lib/python3.13/site-packages/plotnine/layer.py:372: PlotnineWarning: geom_point : Removed 78 rows containing missing values.
# NOTE: x is a continuous variable here, which is the likely cause of the
# position_dodge warning below; mapping x to 'four_regions' instead would
# draw one violin per region.
(p9.ggplot(df2, p9.aes(x='imports', y='exports', fill='four_regions'))
 + p9.geom_violin(alpha=0.5)
 + p9.theme_bw()
 + p9.scale_fill_manual(values=four_region_colors)
)
/opt/anaconda3/lib/python3.13/site-packages/plotnine/layer.py:293: PlotnineWarning: stat_ydensity : Removed 35 rows containing non-finite values.
/opt/anaconda3/lib/python3.13/site-packages/plotnine/positions/position.py:232: PlotnineWarning: position_dodge requires non-overlapping x intervals
x: individuals using internet, with a scale of 10
y: GDP/capita
bubble size: corresponding to income
bubble colour: corresponding to four_regions
watermark: year of data (2001)
import pandas as pd

df3 = pd.read_csv("q3data.csv")

four_region_colors = {
    'asia': '#ff5872',
    'europe': '#ffec33',
    'africa': '#00d5e9',
    'americas': '#99ef33'
}
(p9.ggplot(df3, p9.aes(x='internet_users', y='gdp', fill='four_regions'))
 # alpha is a fixed value, so it is set outside aes() rather than mapped.
 + p9.geom_point(p9.aes(fill='four_regions', size='income'),
                 alpha=0.8, color='black', stroke=0.1)
 + p9.theme_bw()
 + p9.theme(
     panel_grid_major=p9.element_line(color='#dddddd', alpha=0.3, size=0.5),
     panel_grid_minor=p9.element_line(color='#eeeeee', alpha=0.25, size=0.3),
     axis_text_x=p9.element_text(color='black', alpha=0.3),
     axis_text_y=p9.element_text(color='black', alpha=0.3),
     axis_title_x=p9.element_text(color='black', alpha=0.6),
     axis_title_y=p9.element_text(color='black', alpha=0.6),
     panel_border=p9.element_rect(color='black', alpha=0.3),
     axis_ticks_major_x=p9.element_blank(),
     axis_ticks_major_y=p9.element_blank(),
     figure_size=(7, 4))
 + p9.scale_x_continuous(limits=(0, 100),
                         breaks=[10, 20, 30, 40, 50, 60, 70, 80, 90],
                         labels=['10', '20', '30', '40', '50', '60', '70', '80', '90'])
 + p9.scale_y_log10(breaks=[200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000],
                    labels=['200', '500', '1000', '2000', '5000', '10k', '20k', '50k', '100k'])
 + p9.scale_fill_manual(values=four_region_colors)
 # Large, faint year label drawn behind the points as a watermark.
 + p9.annotate(geom='text', label='2001', x=55, y=4000, size=140,
               alpha=0.1, va='center', ha='center')
 + p9.labs(x='Individuals using internet', y='GDP/Capita')
 # range is passed by keyword, matching the other two plots.
 + p9.scale_size(range=[0.1, 15])
)
/opt/anaconda3/lib/python3.13/site-packages/plotnine/layer.py:372: PlotnineWarning: geom_point : Removed 32 rows containing missing values.
(p9.ggplot(df3, p9.aes(x='internet_users', y='gdp', fill='four_regions'))
 + p9.geom_violin(alpha=0.5)
 + p9.theme_bw()
 + p9.scale_fill_manual(values=four_region_colors)
)
/opt/anaconda3/lib/python3.13/site-packages/plotnine/layer.py:293: PlotnineWarning: stat_ydensity : Removed 25 rows containing non-finite values.
/opt/anaconda3/lib/python3.13/site-packages/plotnine/positions/position.py:232: PlotnineWarning: position_dodge requires non-overlapping x intervals
The violin plot shows the same data as the bubble plot, or what a boxplot would give us, but it does a better job of visualizing the spread of GDP among the different regions.
AI (ChatGPT 5.0 and Perplexity AI (Study: Step-by-Step Learning)) was used to complete this project, specifically in the following ways:
Looking up solutions to fix errors.
Brainstorming use cases for different graphs.
Looking up the correct syntax for functions.
Understanding the use cases of different functions in plotnine.
References:
https://www.data-to-viz.com/ for graphing ideas for part 4 of the tasks.
https://plotnine.org/reference/ to look up correct syntax for functions.